Feature Extraction and Feature Selection: Reducing Data Complexity With Apache Spark



Similar resources

Feature Extraction and Feature Selection: Reducing Data Complexity with Apache Spark

Feature extraction and feature selection are the first tasks in pre-processing input logs to detect cyber security threats and attacks with machine learning. When analyzing heterogeneous data derived from different sources, these tasks are time-consuming and difficult to manage efficiently. In this paper, we present an approach for handli...


An Information Theoretic Feature Selection Framework for Big Data under Apache Spark

With the advent of extremely high dimensional datasets, dimensionality reduction techniques are becoming mandatory. Among many techniques, feature selection has been growing in interest as an important tool to identify relevant features on huge datasets (both in number of instances and features). The purpose of this work is to demonstrate that standard feature selection methods can be paralleli...


Feature Selection and Non-linear Feature Extraction

Feature extraction and feature selection are two important tasks in pattern recognition. Classification algorithms like k-nearest neighbors, which are based on the assumption that patterns in the same class are close to each other and those in different classes are far apart (locality property), rely heavily on the quality of the features extracted from the input data. In this work, an objective ...


Massively Parallel Unsupervised Feature Selection on Spark

High dimensional data sets pose important challenges such as the curse of dimensionality and increased computational costs. Dimensionality reduction is therefore a crucial step for most data mining applications. Feature selection techniques allow us to achieve said reduction. However, it is nowadays common to deal with huge data sets, and most existing feature selection algorithms are designed ...


A Real-Time Electroencephalography Classification in Emotion Assessment Based on Synthetic Statistical-Frequency Feature Extraction and Feature Selection

Purpose: To assess three main emotions (happy, sad and calm) by various classifiers, using appropriate feature extraction and feature selection. Materials and Methods: In this study, a combination of Power Spectral Density and a series of statistical features is proposed as statistical-frequency features. Next, a feature selection method from pattern recognition (PR) Tools is presented to e...



Journal

Journal title: SSRN Electronic Journal

Year: 2017

ISSN: 1556-5068

DOI: 10.2139/ssrn.3432178